Skip to content

[LifetimeSafety] Refactor FactGenerator to use RecursiveASTVisitor #153661

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

usx95
Copy link
Contributor

@usx95 usx95 commented Aug 14, 2025

This PR refactors the FactGenerator class to use RecursiveASTVisitor for traversing statements, which helps avoid duplicate fact generation for nodes that appear multiple times in the AST. The implementation:

  1. Introduces a new FactGeneratorDriver class that uses RecursiveASTVisitor to traverse statements
  2. Simplifies FactGenerator by removing the run() method and moving block handling logic to the driver
  3. Adds tracking of visited statements to prevent duplicate processing
  4. Improves handling of loans in the test helper to support multiple loans per variable
  5. Adds a test case to verify that implicit casts don't create duplicate loans

Fixes #153592

Copy link
Contributor Author

usx95 commented Aug 14, 2025

@usx95 usx95 force-pushed the users/usx95/08-14-_lifetimesafety_track_view_types_gsl_pointer branch from e61817a to 43d28dc Compare August 14, 2025 20:07
@usx95 usx95 force-pushed the users/usx95/08-14-_lifetimesafety_track_view_types_gsl_pointer branch from 43d28dc to 1877937 Compare August 14, 2025 20:33
Copy link

github-actions bot commented Aug 14, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@usx95 usx95 force-pushed the users/usx95/08-14-_lifetimesafety_track_view_types_gsl_pointer branch from 1877937 to f3ff755 Compare August 14, 2025 21:16
@usx95 usx95 changed the title [LifetimeSafety] Track view types/gsl::Pointer. [LifetimeSafety] Prevent duplicate loans and statement visits Aug 14, 2025
@usx95 usx95 force-pushed the users/usx95/08-14-_lifetimesafety_track_view_types_gsl_pointer branch from f3ff755 to cdc9849 Compare August 14, 2025 21:17
@usx95 usx95 force-pushed the users/usx95/07-20-basic_error_report_for_use_after_free branch from f590db6 to 894fa21 Compare August 14, 2025 21:17
@usx95 usx95 requested review from Xazax-hun and ymand August 14, 2025 21:19
@usx95 usx95 force-pushed the users/usx95/08-14-_lifetimesafety_track_view_types_gsl_pointer branch from cdc9849 to 3bb7fca Compare August 14, 2025 21:20
@usx95 usx95 marked this pull request as ready for review August 14, 2025 21:21
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:analysis labels Aug 14, 2025
@llvmbot
Copy link
Member

llvmbot commented Aug 14, 2025

@llvm/pr-subscribers-clang-analysis

Author: Utkarsh Saxena (usx95)

Changes

This change adds a VisitedStmts set to avoid processing the same statement multiple times (specially through VisitImplicitCastExpr) leading to issuing multiple loans for the same expression!

Fixes #153592


Full diff: https://github.com/llvm/llvm-project/pull/153661.diff

2 Files Affected:

  • (modified) clang/lib/Analysis/LifetimeSafety.cpp (+8)
  • (modified) clang/unittests/Analysis/LifetimeSafetyTest.cpp (+32-21)
diff --git a/clang/lib/Analysis/LifetimeSafety.cpp b/clang/lib/Analysis/LifetimeSafety.cpp
index c762f63c45e09..86d9517dde45c 100644
--- a/clang/lib/Analysis/LifetimeSafety.cpp
+++ b/clang/lib/Analysis/LifetimeSafety.cpp
@@ -396,6 +396,7 @@ class FactGenerator : public ConstStmtVisitor<FactGenerator> {
     // initializations and destructions are processed in the correct sequence.
     for (const CFGBlock *Block : *AC.getAnalysis<PostOrderCFGView>()) {
       CurrentBlockFacts.clear();
+      VisitedStmts.clear();
       for (unsigned I = 0; I < Block->size(); ++I) {
         const CFGElement &Element = Block->Elements[I];
         if (std::optional<CFGStmt> CS = Element.getAs<CFGStmt>())
@@ -408,6 +409,12 @@ class FactGenerator : public ConstStmtVisitor<FactGenerator> {
     }
   }
 
+  void Visit(const Stmt *S) {
+    if (!S || !VisitedStmts.insert(S).second)
+      return;
+    Base::Visit(S);
+  }
+
   void VisitDeclStmt(const DeclStmt *DS) {
     for (const Decl *D : DS->decls())
       if (const auto *VD = dyn_cast<VarDecl>(D))
@@ -551,6 +558,7 @@ class FactGenerator : public ConstStmtVisitor<FactGenerator> {
   FactManager &FactMgr;
   AnalysisDeclContext &AC;
   llvm::SmallVector<Fact *> CurrentBlockFacts;
+  llvm::DenseSet<const Stmt *> VisitedStmts;
 };
 
 // ========================================================================= //
diff --git a/clang/unittests/Analysis/LifetimeSafetyTest.cpp b/clang/unittests/Analysis/LifetimeSafetyTest.cpp
index c8d88b4ea2277..13e5832d70050 100644
--- a/clang/unittests/Analysis/LifetimeSafetyTest.cpp
+++ b/clang/unittests/Analysis/LifetimeSafetyTest.cpp
@@ -11,6 +11,7 @@
 #include "clang/ASTMatchers/ASTMatchers.h"
 #include "clang/Testing/TestAST.h"
 #include "llvm/ADT/StringMap.h"
+#include "llvm/Testing/Support/Error.h"
 #include "gmock/gmock.h"
 #include "gtest/gtest.h"
 #include <optional>
@@ -20,6 +21,7 @@ namespace clang::lifetimes::internal {
 namespace {
 
 using namespace ast_matchers;
+using ::testing::SizeIs;
 using ::testing::UnorderedElementsAreArray;
 
 // A helper class to run the full lifetime analysis on a piece of code
@@ -96,21 +98,18 @@ class LifetimeTestHelper {
     return OID;
   }
 
-  std::optional<LoanID> getLoanForVar(llvm::StringRef VarName) {
+  std::vector<LoanID> getLoansForVar(llvm::StringRef VarName) {
     auto *VD = findDecl<VarDecl>(VarName);
-    if (!VD)
-      return std::nullopt;
+    if (!VD) {
+      ADD_FAILURE() << "No VarDecl found for '" << VarName << "'";
+      return {};
+    }
     std::vector<LoanID> LID = Analysis.getLoanIDForVar(VD);
     if (LID.empty()) {
       ADD_FAILURE() << "Loan for '" << VarName << "' not found.";
-      return std::nullopt;
-    }
-    // TODO: Support retrieving more than one loans to a var.
-    if (LID.size() > 1) {
-      ADD_FAILURE() << "More than 1 loans found for '" << VarName;
-      return std::nullopt;
+      return {};
     }
-    return LID[0];
+    return LID;
   }
 
   std::optional<LoanSet> getLoansAtPoint(OriginID OID,
@@ -121,13 +120,12 @@ class LifetimeTestHelper {
     return Analysis.getLoansAtPoint(OID, PP);
   }
 
-  std::optional<llvm::DenseSet<LoanID>>
+  std::optional<std::vector<LoanID>>
   getExpiredLoansAtPoint(llvm::StringRef Annotation) {
     ProgramPoint PP = Runner.getProgramPoint(Annotation);
     if (!PP)
       return std::nullopt;
-    auto Expired = Analysis.getExpiredLoansAtPoint(PP);
-    return llvm::DenseSet<LoanID>{Expired.begin(), Expired.end()};
+    return Analysis.getExpiredLoansAtPoint(PP);
   }
 
 private:
@@ -197,12 +195,13 @@ MATCHER_P2(HasLoansToImpl, LoanVars, Annotation, "") {
 
   std::vector<LoanID> ExpectedLoans;
   for (const auto &LoanVar : LoanVars) {
-    std::optional<LoanID> ExpectedLIDOpt = Info.Helper.getLoanForVar(LoanVar);
-    if (!ExpectedLIDOpt) {
+    std::vector<LoanID> ExpectedLIDs = Info.Helper.getLoansForVar(LoanVar);
+    if (ExpectedLIDs.empty()) {
       *result_listener << "could not find loan for var '" << LoanVar << "'";
       return false;
     }
-    ExpectedLoans.push_back(*ExpectedLIDOpt);
+    ExpectedLoans.insert(ExpectedLoans.end(), ExpectedLIDs.begin(),
+                         ExpectedLIDs.end());
   }
 
   return ExplainMatchResult(UnorderedElementsAreArray(ExpectedLoans),
@@ -221,17 +220,17 @@ MATCHER_P(AreExpiredAt, Annotation, "") {
                      << Annotation << "'";
     return false;
   }
-  std::vector<LoanID> ActualExpiredLoans(ActualExpiredSetOpt->begin(),
-                                         ActualExpiredSetOpt->end());
+  std::vector<LoanID> ActualExpiredLoans = *ActualExpiredSetOpt;
   std::vector<LoanID> ExpectedExpiredLoans;
   for (const auto &VarName : Info.LoanVars) {
-    auto LoanIDOpt = Helper.getLoanForVar(VarName);
-    if (!LoanIDOpt) {
+    auto LoanIDs = Helper.getLoansForVar(VarName);
+    if (LoanIDs.empty()) {
       *result_listener << "could not find a loan for variable '" << VarName
                        << "'";
       return false;
     }
-    ExpectedExpiredLoans.push_back(*LoanIDOpt);
+    ExpectedExpiredLoans.insert(ExpectedExpiredLoans.end(), LoanIDs.begin(),
+                                LoanIDs.end());
   }
   return ExplainMatchResult(UnorderedElementsAreArray(ExpectedExpiredLoans),
                             ActualExpiredLoans, result_listener);
@@ -730,5 +729,17 @@ TEST_F(LifetimeAnalysisTest, ReassignedPointerThenOriginalExpires) {
   EXPECT_THAT(LoansTo({"s1", "s2"}), AreExpiredAt("p_after_s1_expires"));
 }
 
+TEST_F(LifetimeAnalysisTest, NoDuplicateLoansForImplicitCastToConst) {
+  SetupTest(R"(
+    void target() {
+      MyObj a;
+      const MyObj* p = &a;
+      const MyObj* q = &a;
+      POINT(at_end);
+    }
+  )");
+  EXPECT_THAT(Helper->getLoansForVar("a"), SizeIs(2));
+}
+
 } // anonymous namespace
 } // namespace clang::lifetimes::internal

@llvmbot
Copy link
Member

llvmbot commented Aug 14, 2025

@llvm/pr-subscribers-clang

Author: Utkarsh Saxena (usx95)

Changes

This change adds a VisitedStmts set to avoid processing the same statement multiple times (specially through VisitImplicitCastExpr) leading to issuing multiple loans for the same expression!

Fixes #153592


Full diff: https://github.com/llvm/llvm-project/pull/153661.diff

2 Files Affected:

  • (modified) clang/lib/Analysis/LifetimeSafety.cpp (+8)
  • (modified) clang/unittests/Analysis/LifetimeSafetyTest.cpp (+32-21)
diff --git a/clang/lib/Analysis/LifetimeSafety.cpp b/clang/lib/Analysis/LifetimeSafety.cpp
index c762f63c45e09..86d9517dde45c 100644
--- a/clang/lib/Analysis/LifetimeSafety.cpp
+++ b/clang/lib/Analysis/LifetimeSafety.cpp
@@ -396,6 +396,7 @@ class FactGenerator : public ConstStmtVisitor<FactGenerator> {
     // initializations and destructions are processed in the correct sequence.
     for (const CFGBlock *Block : *AC.getAnalysis<PostOrderCFGView>()) {
       CurrentBlockFacts.clear();
+      VisitedStmts.clear();
       for (unsigned I = 0; I < Block->size(); ++I) {
         const CFGElement &Element = Block->Elements[I];
         if (std::optional<CFGStmt> CS = Element.getAs<CFGStmt>())
@@ -408,6 +409,12 @@ class FactGenerator : public ConstStmtVisitor<FactGenerator> {
     }
   }
 
+  void Visit(const Stmt *S) {
+    if (!S || !VisitedStmts.insert(S).second)
+      return;
+    Base::Visit(S);
+  }
+
   void VisitDeclStmt(const DeclStmt *DS) {
     for (const Decl *D : DS->decls())
       if (const auto *VD = dyn_cast<VarDecl>(D))
@@ -551,6 +558,7 @@ class FactGenerator : public ConstStmtVisitor<FactGenerator> {
   FactManager &FactMgr;
   AnalysisDeclContext &AC;
   llvm::SmallVector<Fact *> CurrentBlockFacts;
+  llvm::DenseSet<const Stmt *> VisitedStmts;
 };
 
 // ========================================================================= //
diff --git a/clang/unittests/Analysis/LifetimeSafetyTest.cpp b/clang/unittests/Analysis/LifetimeSafetyTest.cpp
index c8d88b4ea2277..13e5832d70050 100644
--- a/clang/unittests/Analysis/LifetimeSafetyTest.cpp
+++ b/clang/unittests/Analysis/LifetimeSafetyTest.cpp
@@ -11,6 +11,7 @@
 #include "clang/ASTMatchers/ASTMatchers.h"
 #include "clang/Testing/TestAST.h"
 #include "llvm/ADT/StringMap.h"
+#include "llvm/Testing/Support/Error.h"
 #include "gmock/gmock.h"
 #include "gtest/gtest.h"
 #include <optional>
@@ -20,6 +21,7 @@ namespace clang::lifetimes::internal {
 namespace {
 
 using namespace ast_matchers;
+using ::testing::SizeIs;
 using ::testing::UnorderedElementsAreArray;
 
 // A helper class to run the full lifetime analysis on a piece of code
@@ -96,21 +98,18 @@ class LifetimeTestHelper {
     return OID;
   }
 
-  std::optional<LoanID> getLoanForVar(llvm::StringRef VarName) {
+  std::vector<LoanID> getLoansForVar(llvm::StringRef VarName) {
     auto *VD = findDecl<VarDecl>(VarName);
-    if (!VD)
-      return std::nullopt;
+    if (!VD) {
+      ADD_FAILURE() << "No VarDecl found for '" << VarName << "'";
+      return {};
+    }
     std::vector<LoanID> LID = Analysis.getLoanIDForVar(VD);
     if (LID.empty()) {
       ADD_FAILURE() << "Loan for '" << VarName << "' not found.";
-      return std::nullopt;
-    }
-    // TODO: Support retrieving more than one loans to a var.
-    if (LID.size() > 1) {
-      ADD_FAILURE() << "More than 1 loans found for '" << VarName;
-      return std::nullopt;
+      return {};
     }
-    return LID[0];
+    return LID;
   }
 
   std::optional<LoanSet> getLoansAtPoint(OriginID OID,
@@ -121,13 +120,12 @@ class LifetimeTestHelper {
     return Analysis.getLoansAtPoint(OID, PP);
   }
 
-  std::optional<llvm::DenseSet<LoanID>>
+  std::optional<std::vector<LoanID>>
   getExpiredLoansAtPoint(llvm::StringRef Annotation) {
     ProgramPoint PP = Runner.getProgramPoint(Annotation);
     if (!PP)
       return std::nullopt;
-    auto Expired = Analysis.getExpiredLoansAtPoint(PP);
-    return llvm::DenseSet<LoanID>{Expired.begin(), Expired.end()};
+    return Analysis.getExpiredLoansAtPoint(PP);
   }
 
 private:
@@ -197,12 +195,13 @@ MATCHER_P2(HasLoansToImpl, LoanVars, Annotation, "") {
 
   std::vector<LoanID> ExpectedLoans;
   for (const auto &LoanVar : LoanVars) {
-    std::optional<LoanID> ExpectedLIDOpt = Info.Helper.getLoanForVar(LoanVar);
-    if (!ExpectedLIDOpt) {
+    std::vector<LoanID> ExpectedLIDs = Info.Helper.getLoansForVar(LoanVar);
+    if (ExpectedLIDs.empty()) {
       *result_listener << "could not find loan for var '" << LoanVar << "'";
       return false;
     }
-    ExpectedLoans.push_back(*ExpectedLIDOpt);
+    ExpectedLoans.insert(ExpectedLoans.end(), ExpectedLIDs.begin(),
+                         ExpectedLIDs.end());
   }
 
   return ExplainMatchResult(UnorderedElementsAreArray(ExpectedLoans),
@@ -221,17 +220,17 @@ MATCHER_P(AreExpiredAt, Annotation, "") {
                      << Annotation << "'";
     return false;
   }
-  std::vector<LoanID> ActualExpiredLoans(ActualExpiredSetOpt->begin(),
-                                         ActualExpiredSetOpt->end());
+  std::vector<LoanID> ActualExpiredLoans = *ActualExpiredSetOpt;
   std::vector<LoanID> ExpectedExpiredLoans;
   for (const auto &VarName : Info.LoanVars) {
-    auto LoanIDOpt = Helper.getLoanForVar(VarName);
-    if (!LoanIDOpt) {
+    auto LoanIDs = Helper.getLoansForVar(VarName);
+    if (LoanIDs.empty()) {
       *result_listener << "could not find a loan for variable '" << VarName
                        << "'";
       return false;
     }
-    ExpectedExpiredLoans.push_back(*LoanIDOpt);
+    ExpectedExpiredLoans.insert(ExpectedExpiredLoans.end(), LoanIDs.begin(),
+                                LoanIDs.end());
   }
   return ExplainMatchResult(UnorderedElementsAreArray(ExpectedExpiredLoans),
                             ActualExpiredLoans, result_listener);
@@ -730,5 +729,17 @@ TEST_F(LifetimeAnalysisTest, ReassignedPointerThenOriginalExpires) {
   EXPECT_THAT(LoansTo({"s1", "s2"}), AreExpiredAt("p_after_s1_expires"));
 }
 
+TEST_F(LifetimeAnalysisTest, NoDuplicateLoansForImplicitCastToConst) {
+  SetupTest(R"(
+    void target() {
+      MyObj a;
+      const MyObj* p = &a;
+      const MyObj* q = &a;
+      POINT(at_end);
+    }
+  )");
+  EXPECT_THAT(Helper->getLoansForVar("a"), SizeIs(2));
+}
+
 } // anonymous namespace
 } // namespace clang::lifetimes::internal

Copy link
Collaborator

@Xazax-hun Xazax-hun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if this is the right approach. If everything works out well, every time we call Visit on an expression, there should be a guarantee we already visited all the subexpressions of it (modulo some corner cases with short circuiting operators, ternaries and trivially false branches). So, we might be able to structure the code in a way that we never need to call Visit recursively for a subexpression, and we do not need to keep a VisitedStmts set.

But in case that does not work out for some reason I am also fine with this approach.

@usx95 usx95 force-pushed the users/usx95/08-14-_lifetimesafety_track_view_types_gsl_pointer branch from 3bb7fca to 546483e Compare August 18, 2025 10:30
Base automatically changed from users/usx95/07-20-basic_error_report_for_use_after_free to main August 18, 2025 11:46
@usx95 usx95 force-pushed the users/usx95/08-14-_lifetimesafety_track_view_types_gsl_pointer branch from 546483e to f11f942 Compare August 18, 2025 11:48
Copy link
Contributor Author

usx95 commented Aug 18, 2025

If everything works out well, every time we call Visit on an expression, there should be a guarantee we already visited all the subexpressions of it

Yes this is the ideal scenario. But we visit the statements given by the CFG and this can, more often than not, skip many sub expressions. I don't have a great solution. An alternate could be to use a Recursive AST visitor (RAV) along that drives our FactGenerator stmt visitor. The RAV would still need to maintain a Visited set but FactGenerator visitor can now assume that all subexpressions have been visited without needing for each Visit* Method in FactGenerator to explicitly revisit subexpressions. WDYT about this alternative of using a RAV driver ?

@Xazax-hun
Copy link
Collaborator

The RAV would still need to maintain a Visited set

Could you elaborate on why?

WDYT about this alternative of using a RAV driver ?

Hmm, there is a change that this could be a bit simpler than the current solution. If you can give this a try to see where does that lead it would be awesome.

@usx95 usx95 force-pushed the users/usx95/08-14-_lifetimesafety_track_view_types_gsl_pointer branch 2 times, most recently from 5ccc210 to dbb99ed Compare August 18, 2025 15:56
Copy link
Contributor Author

usx95 commented Aug 18, 2025

The RAV would still need to maintain a Visited set

Could you elaborate on why?

The CFG can contain sub expressions as individual CFG statements. Later parent expressions can again appear as independent CFGStmt. It is not guaranteed that all subexpressions of all expr types appear as CFGStmt. So we recursively visit each CFGStmt and skip already visited Stmt which might have appeared before in CFG.

@usx95 usx95 requested a review from Xazax-hun August 18, 2025 15:59
@usx95 usx95 force-pushed the users/usx95/08-14-_lifetimesafety_track_view_types_gsl_pointer branch from dbb99ed to 2f329e7 Compare August 18, 2025 16:02
@usx95 usx95 changed the title [LifetimeSafety] Prevent duplicate loans and statement visits [LifetimeSafety] Refactor FactGenerator to use RecursiveASTVisitor Aug 18, 2025
Copy link
Contributor Author

usx95 commented Aug 19, 2025

WDYT about this alternative of using a RAV driver ?

Hmm, there is a change that this could be a bit simpler than the current solution. If you can give this a try to see where does that lead it would be awesome.

Switched to using RAV for driving the FactGenerator @Xazax-hun

@usx95 usx95 force-pushed the users/usx95/08-14-_lifetimesafety_track_view_types_gsl_pointer branch from 2eac6ae to 3b3b93b Compare August 19, 2025 12:08
@usx95 usx95 force-pushed the users/usx95/08-14-_lifetimesafety_track_view_types_gsl_pointer branch from 3b3b93b to 440c3fe Compare August 19, 2025 12:16
@Xazax-hun
Copy link
Collaborator

Thanks! I am not actually sure if this one is simpler than the original version. Sorry about that. I don't have a strong feeling which one to pick. I guess the complexity here is an inevitable consequence of our CFGs being just an overlay on top of the AST and it is not entirely clear when should we use one or the other for traversals because it depends on what we are processing.

Let me know which one do you prefer (this or the previous), and I am happy to approve either of them.

Copy link
Contributor Author

usx95 commented Aug 19, 2025

This does introduce an extra layer of orchestration but I like the current approach. The FactGeneratorVisitor can now safely assume that all sub expressions must have been visited and does not have to case-by-case handle it during each Visit* methods. I think this becomes more and more important as we begin to handle more expression types.

I guess the complexity here is an inevitable consequence of our CFGs being just an overlay on top of the AST and it is not entirely clear when should we use one or the other for traversals because it depends on what we are processing.

+1

@Xazax-hun
Copy link
Collaborator

Thanks!

I have one more question. So we need the reverse post order property of the CFG to ensure that we visit everything in the right order. The only reason why I am a bit unsure here, is because I wonder if it can ever cause some trouble that the RAV visiting the subexpressions is crossing a basic block boundary and venturing into that block too early, before all of its predecessors have been visited.

If this cannot happen, I think the current approach should be fine.

@usx95 usx95 added the clang:temporal-safety Issue/FR relating to the lifetime analysis in Clang (-Wdangling, -Wreturn-local-addr) label Aug 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:analysis clang:temporal-safety Issue/FR relating to the lifetime analysis in Clang (-Wdangling, -Wreturn-local-addr) clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[LifetimeSafety] Two loans created from the same expression
3 participants